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Background and objectives: The severity of HIV-1 infection, measured by set-point viral load (SPVL), is 
highly variable between individuals. Its heritability between infections quantifies the control the patho- 
gen genotype has over disease severity. Heritability estimates vary widely between studies, but differ- 
ences in methods make comparison difficult. Phylogenetic comparative analysis offers measures of 
phylogenetic signal, but it is unclear how to interpret them in terms of the fraction of variance in SPVL 
controlled by the virus genotype. 

Methodology: We present computational methods which link statistics summarizing phylogenetic sig- 
nal to heritability, h 2 in order to test for and quantify it. We re-analyse data from Switzerland and 
Uganda, and apply it to new data from the Netherlands. We systematically compare established and 
new (e.g. phylogenetic pairs, PP) phylogenetic signal statistics. 

Results: Heritability estimates varied by method and dataset. Several methods were consistently able to 
detect simulated heritability above h 2 % 0.4, but none below. Pagel's X was the most robust and 
sensitive. The PP method found no heritability in the Netherlands data, whereas Pagel's X found sig- 
nificant heritability only in a narrow subdivision (P = 0.038). Heritability was estimated at h 2 = 0.52 (95% 
confidence interval 0.00-0.63). 

Conclusions and implications: This standardized measure, /i 2 , allows comparability of heritability 
between cohorts. We confirm high heritability in Swiss data, but neither in Ugandan data nor in the 
Netherlands, where it is barely significant or undetectable. Existing phylogenetic methods are ill-suited 
for detecting heritability below h 2 ~ 0.4, which may nonetheless be biologically important. 

KEYWORDS: HIV-1; heritability; phylogenetic comparative analysis; set-point viral load; 
phylogenetic signal; virulence 
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BACKGROUND AND OBJECTIVES 

HIV has a high mutation rate [1,2] and daily turnover 
[3, 4] and therefore adapts rapidly under local select- 
ive pressure from the immune system [5, 6, 7] or 
a nti retroviral drugs [8, 9]. Increasingly there is inter- 
est in the transmission of escape [10, 11] or drug 
resistance mutations [12], which may enable viral 
adaptation to the host population. The rate at which 
a trait evolves in response to natural selection is 
determined by its heritability. 

Recent work has suggested that virulence may 
also evolve at the population level [13] by natural 
selection towards a level optimal for transmission 
[14, 15]. Variation in virulence has a large impact on 
mortality and morbidity, so its evolutionary potential 
may present challenges or provide opportunities in 
public health interventions [16]. For example, vac- 
cines which reduce growth rate or toxicity are pre- 
dicted to reduce the costs of virulence, to the 
vaccinated host and the pathogen. This potentially 
raises the optimal virulence, resulting in poorer out- 
comes for the unvaccinated individuals [1 7]. 

Virulence in H IV is well approximated by set-point 
viral load (SPVL), which refers to the density of vir- 
ions in the blood during asymptomatic infection. 
SPVL is an early prognostic indicator for Al DS, as it 
varies by orders of magnitude between individuals, 
with high values having faster CD4 cell decline, pro- 
gressing more rapidly to AIDS and death [18, 19, 20]. 
However, it is relatively stable within the individual 
[21] meaningthat it can be measured ata wide range 
of time points in an individuals' infection [22]. 

Many host factors influence SPVL (Human 
Leukocyte Antigen (HLA) type (reviewed in [23])), 
sex [24], ethnicity [25], age [26], co-infections [27, 
28]). Specific human genetic markers have been 
identified to which 13% of SPVL variation can be 
attributed, with a further 9% explained by age, sex 
and population structure [29]. Recently, several 
studies have indicated that viral factors play a sub- 
stantial role in SPVL variation by measuring its her- 
itability between infections (reviewed in [30]). Most 
of these quantify the similarity in SPVL within trans- 
mission pairs, which are sexual couples in which one 
has infected the other [31-34]. 

The phenotype of any organism is controlled 
partly by its genome, and partly by its environment. 
Throughout this work we define heritability, h 2 , in the 
broad sense as the proportion of total phenotypic 
variance (<7p) ascribed to genetic variance (oq) 
[equation (1), [35]]. In the environmental component 



of variance, o\, we conceptually include all host gen- 
etic and non-genetic effects, as well as interactions 
between host and virus genotype. 



Alizon et al. [36] used a phylogenetic comparative 
approach to identify phylogenetic signal as a meas- 
ure of heritability, without requiring behavioural 
data. Phylogenetic signal is the extent to which indi- 
viduals with similar traits can be observed to cluster 
together on the phylogeny. This approach has the 
advantage that any sample of well-characterized pa- 
tients could be analysed in this way. However, the 
authors did not account for cofactors such as age 
and co-infections, which influence the SPVLand may 
clustertogetheron the phylogeny. It is also uncertain 
exactly how the quantity measured by the two 
methods used (PagePs X and Blomberg's K) should 
be interpreted in the context of heritability. We 
herein propose a new approach which also uses only 
the phylogenetic relationships to determine genetic 
proximity of viruses, but additionally links the results 
to true heritability, and in some cases allows inclu- 
sion of the effect of cofactors on SPVL. 

The aims of this study were to evaluate the phylo- 
genetic approach for estimating heritability, to com- 
pare the efficacy of the various statistics available for 
quantifying heritability on simulated and real data, to 
use these methods to confirm the presence of herit- 
ability in previously analysed data, and to measure 
heritability in a dataset from the Netherlands which 
has not been previously analysed for this purpose. 

METHODOLOGY 

Data 

Data from Rakai, Uganda 

The study population was enrolled in the Rakai 
Community Cohort Study in the rural Rakai District 
of south-western Uganda. The study methods for 
this cohort have been outlined elsewhere [34, 37, 
38]. We used those individuals who were sampled 
at a single time point in April 1995 (tt = 332). 
Characteristics of the cohort are shown in 
Supplementary Table SI . Serum samples were used 
to measure SPVL, with most individuals providing a 
single measurement. 

The phylogenetic analysis was conducted in 
RAxML [39] using the General Time Reversible 
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substitution model [40], the T model of rate hetero- 
geneity with four distinct rate categories and a pro- 
portion of invariant sites [41] (GTR+r+l), which had 
the best score when the alignment was analysed with 
ModelTest [42]. The alignment was also analysed 
using the Recombination Analysis Tool (RAT [43]), 
which found no apparent recombination within the 
studied genes (data not shown). 

A Simian Immunodeficiency Virus (SIV) outgroup 
was included (accession number: AB1 77846.1). The 
phylogeny was constructed from either the gp41 
(ew) or p24 [gag) sequences (Supplementary Figs 
SI and S2), which were analysed separately because 
they have different substitution rates [44]. We also 
performed the phylogenetic and heritability analysis 
on the subtype A and D sequences separately, due to 
apparent imbalance in the joint trees. 

Data from Switzerland 

The Swiss data were taken from the Swiss HIV 
Cohort Study and its integrated genotypic drug re- 
sistance database which has been described else- 
where [45, 46], and the specific selection has been 
analysed in previous work [36, 47, 48]. The phyl- 
ogeny, constructed from the reverse transcriptase 
and protease components of the pol gene 
(Supplementary Fig. S3), had been inferred in this 
previous work [36] using PhyML [49] from subtype B 
infected individuals forwhom at least three viral load 
measurements were available after primary infection 
and before anti-retroviral therapy (r? = 661). 
Characteristics of the cohort are shown in 
Supplementary Table S2. 

Note that PhyML was used for the Swiss data, 
compared with RAxML in the other two datasets. 
However, we expect the methods to produce very 
similar results as both use a hill-climbing algorithm, 
likelihood scores are well matched when using the 
same nucleotide substitution model, and there is no 
evidence for systematic differences in the results [50]. 

We analysed four subdivisions of the data as in 
previous work [36]. The entire dataset ('All') fit the 
Liberal definition for viral load variability, meaning 
that at least three consecutive viral load measure- 
ments within the asymptomatic window (6-36 
months after first positive viral RNA) remained 
within a one-log band of one another. The 'Strict' 
subdivision included only those individuals for 
whom the measurements in the asymptomatic win- 
dowall sat within the one-log band. Excluding all but 
men who have sexwith men (MSM) led tothefurther 
subdivisions, 'MSM' and 'MSM Strict'. 



Data from Netherlands 

The study cohort was provided by the ATHENA na- 
tional observational cohort (seroconverters from 
1996 or onwards) and the Amsterdam Cohort 
Studies (seroconverters before 1996) [51]. We 
included only individuals infected with subtype B 
between 1985 and 2008, forwhom appropriate gen- 
etic data and SPVL were available [52] (a = 416). 

We define SPVL as the mean of all log 10 viral load 
measurements taken 6-24 months after the mid- 
point between the last negative and the first positive 
diagnosis. The phylogeny was reconstructed using a 
sequence of length 2064 containing the same elem- 
ents of the pol gene as for the Swiss data. 

We excluded codons strongly associated with 
drug resistance mutations (for details, see 
Supplementary Fig. S4). Four subtype C individuals 
from the same cohort were used as the outgroup. 
The phylogenetic analysis was performed using 
ModelTest [42], RAxM L [39] and RAT [43] in the same 
way as the Rakai cohort, which also identified the 
same nucleotide substitution model as appropriate, 
and no recombination was detected (data not 
shown). 

As with the Swiss data, individuals were 
categorized as 'All', 'Strict', 'MSM' and 'MSM 
Strict'. Additionally, two further categories, 'MSM 
NL' and 'MSM Strict NL', were created from the 
'MSM' groups, which excluded individuals not 
originating in the Netherlands in order to further 
reduce confounding factors. 

Trees for simulations were read and manipulated 
using the ape package [53] in R [54], which was also 
used to plot the trees. 

Methods for calculating heritability statistics 

A phylogeny, reconstructed from genetic data, is an 
approximation of the transmission network. 
Phylogenetic signal is a measure of how well trait 
values at the tree tips match their relative positions 
on the phylogeny, and several established methods 
are available to quantify this signal in terms of a 
single statistic: the Mantel test [55]; Blomberg's in- 
dependent contrasts, which give us the Blomberg's 
K and PICv (variance of phylogenetic independent 
contrasts) statistics [56]; Pagel's X transformation 
[57]; and the Abouheif-Moran (AM) tests [58], of 
which there are five variants ('oriAbouheif , 
'sumDD', 'nNodes', 'patristic', 'Abouheif , with the 
latter used by default). We also developed two new 
methods which allow control of cofactors, the 



212 | Shirreffet al. 



Evolution, Medicine, and Public Health 



phylogenetic pairs (PP) method and the hierarchical 
clustering (HC) method, which are described briefly 
here. 

The PP method identifies pairs of individuals on 
the tree which are each other's closest neighbour, 
and these are assumed to be transmission pairs. 
Analysis of variance (ANOVA) identifies the degree 
to which the transmission partner explains an indi- 
vidual's SPVL. Crucially, the ANOVA approach 
allows for the inclusion of cofactors. These are age, 
sex and genital ulcer disease in the Rakai dataset 
(Supplementary Table SI); age, sex and risk group 
in the Swiss dataset (Supplementary Table S2); and 
age, sex, risk group, region of origin and the type of 
assay used to measure viral load in the Netherlands 
dataset (Supplementary Table S3). This method also 
ignores individuals who are not part of a phylogen- 
etic pair. 

The HC method is similar but considers larger 
clusters of individuals identified on the phylogeny 
by a threshold branch length, and examines the 
amount of variance in SPVL explained by the cluster. 
Because there is no intuitive ideal cluster size, pro- 
portion included or number of clusters to use, the 
method integrates over the range of clustering 
distances. 

All established and new methods are described in 
detail in the Supplementary data. 

Randomization test 

The significance of a test statistic can be measured 
as in [56] by comparing the statistic derived from the 
data with its distribution under no heritability, which 
is the null hypothesis. Randomizing the tips of the 
tree by randomly reallocating the tips scrambles any 
heritability signal. This is performed 1 OOOtimes, and 
the analysis is repeated for each. The proportion of 
randomized datasets that give a statistic higherthan 
the true value is the one-tailed P-value for presence 
of heritability. When the randomization test was per- 
formed for the PP and HC statistics the cofactors 
remained with their corresponding SPVL data. 

Method of simulating SPVL data on a known 
phylogeny 

This method uses a simple algorithm to simulate 
evolution ofa continuous trait on a known phylogeny 
for a given heritability. This method is similar to that 
used in previous work [36] and is a variation on the 
Ornstein-Uhlenbeck process [59, 60], which allows 



Brownian motion to occur while constraining the 
distribution of the population. 

During the simulation, each node on the tree is 
assigned a trait value in log 10 SPVL, beginning from 
the root, which is assigned the mean ofthe true SPVL 
data. Each daughter node is given a SPVL value de- 
pending on that of its parent, and on the SPVL dis- 
tribution in the whole population. The higher the 
heritability, the more the value depends on the 
parent. 

The SPVL at each subsequent daughter node, 
V D [equation (2)], is derived from the parent 
node, V P , the value h 2 , and the random variable 
M which is normally distributed according to the 
mean and variance of the population log 10 SPVL 
(distribution 3). 

V D = h 2 Vp+M^/l - (h 2 ) 2 , (2) 

M ~N(jl,cr 2 ). (3) 

The value h 2 is therefore the regression slope be- 
tween the trait values at a parent node, V P , and a 
daughter node, V D (or an index and secondary case). 
This has been demonstrated to be equal to the broad 
sense heritability, h 2 as defined in equation (1) [61], 
and its further implications are discussed in the 
Supplementary data. The result is a set of data at 
the tips ofthe tree, simulated with known heritability 
ff 2 , which is used for analysis. We have also explored 
an alternative method of simulation which allows for 
multiple transmission between nodes on the tree, 
which is also described in the Supplementary data. 

Multiple hypothesis testing to estimating true 
heritability and confidence intervals 

For each value of h 2 between 0 and 1 , in increments 
of 0.01 , 1 00 simulations are performed and the rele- 
vant phylogenetic comparative statistics is 
calculated. A hypothesis test for the particular h 2 is 
then performed with these 100 values. They are 
compared with the statistic calculated from the true 
data, and the proportion which is lower than the 
true statistic becomes the probability P that 
the data are consistent with that value of h 2 . The 
values of h 2 which produce P-values closest to 
0.025, 0.975 and 0.500 become the lower and upper 
95% confidence bounds, and the median estimate of 
h 2 , respectively. 

A visual distribution of h 2 is estimated using 
approximate Bayesian computation (ABC), of which 
details are given in the Supplementary data. 
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Estimate power to detect an effect 

To examine the ability of each statistic to detect non- 
null heritability, we simulated 100 datasets for each 
value of a range (0-1) of h 2 values in increments of 
0.05. To visualize the relationship between h 2 and the 
statistic of interest, we calculated the mean and 
standard deviation of the phylogenetic comparative 
statistics calculated from these 100 simulations. In 
order to estimate the power of each statistic to detect 
a significant effect at each value of f? 2 , we performed a 
randomization test on each simulated dataset, as 
described above (see 'Randomization test'), but with 
1 00 randomizations. For each simulated dataset, the 
proportion of randomization tests which find signifi- 
cant (P < 0.05) heritability represents the power to 
detect an effect at that value of h 2 . 

All methods are available in R code on request 
from the corresponding author. 

RESULTS 

Testing significance in previously analysed data 

The significance of heritability from the 
randomization test for the Rakai and Swiss data is 
shown in Tables 1 and 2, respectively. 

A consistent signal was not found in the Rakai 
data (Table 1), but in the Swiss data it was more 
so (Table 2), and was higher for the smaller subdiv- 
isions. We tested the possibility that high signifi- 
cance in the smallest group was an artefact of 
small tree size, but this was not the case 
(Supplementary Table S5). We also tested the ro- 
bustness of the results to phylogenetic uncertainty 
and found that small levels of perturbation (3%) 
have little effect on the results, but as the level of 
perturbation increases, the signal weakens accord- 
ingly (Supplementary Table S6). 

Estimates of heritability in previously analysed 
data 

Tables 1 and 2 show the medians and confidence 
intervals of h 2 as estimated by multiple hypothesis 
testing (MHT). In the Swiss data, the confidence 
intervals exclude zero in several cases. In all ofthese 
instances, the randomization test was also signifi- 
cant. However, a significant randomization test did 
not always correspond to confidence intervals which 
excluded zero, because these are one- and two-tailed 
tests, respectively. 



The distribution of true heritability was calculated 
by ABC using both PP (red) and Pagel's X (blue) for 
each subdivision in the Swiss and Rakai datasets 
(Figs 1 and 2). In the Swiss figure, the MSM Strict 
subdivision exhibits the distribution of h 2 most 
removed from zero, which supports the results from 
the randomization test and lower confidence 
bounds (Table 2). The method using Pagel's X 
exhibits higher heritability in the larger subdivisions. 

Similarly in the Rakai cohort, the most significant 
result from the randomization test (Table 1) is the 
p24 subdivision analysed under Pagel's A, and this is 
reflected in the results from ABC, in which the dis- 
tribution of estimated h 2 is highly positive. 
Interestingly, the PP method finds that the most sig- 
nificant distribution visually is instead from the gp41 
subdivision which is also confirmed by the 
randomization test. 

We also tested an alternative method of simula- 
tion which accounted for differences in branch 
length and allowed multiple generations between 
two adjacent nodes. We applied this to the Rakai 
p24 subtype A data (Supplementary Table S7), and 
the alternative gave higher estimates of h 2 but wider 
confidence intervals. 

Heritability was also measured in the Rakai and 
Swiss data using a phylogenetic mixed model, which 
assumes that the trait is determined by independent 
viral and host effects [62]. These results are given in 
Supplementary Table S4. 

Testing the sensitivity of each statistic 

The sensitivity of each phylogenetic comparative 
statistic to simulated heritability was explored by 
visualizing the relationship between them, and 
measuring the power of the statistic to successfully 
detect an effect of that size, in the entire Swiss 
dataset and the MSM Strict subdivision (Figs 3 and 
4, respectively). None of the statistics had substan- 
tial power to detect heritability below h 2 = 0.4 on any 
subdivision, but most had the power to detect an 
effect above that level. The Mantel test performed 
poorly throughout. Blomberg's K performed well in 
the MSM Strict subdivision, but was insensitive 
when applied to the entire Swiss dataset, with only 
values above h 2 = 0.6 being consistently detected. In 
the entire Swiss dataset, the AM statistic appeared 
the most sensitive, but in the MSM Strict subdiv- 
ision its performance matched that of Pagel's X. 

We also tested the power of variants ofthe HCand 
AM tests compared with their default methods 
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99 Table 2. The statistics (Z), P-values from a randomization test, medians and confidence intervals of h 2 
from MHT on the Swiss data 



Type of test All (n = 661) Strict (n = 230) MSM (n = 404) MSM Strict (n = 134) 





Z 


P 


U 2 

n 


90% LI 


Z 


P 


In 2 

h 


90% LI 


Z 


P 


u 2 

n 


95% CI 


Z 


P 


u 2 
h 


90% LI 


pp 


-0.089 


0.817 


0.08 


(0.00, 


0.376 


0.020 


0.61 


(0.01 , 


0.009 


0.492 


0.18 


(0.00, 


0.595 


0.005 


0.68 


(0.12, 










0.48) 








0.77) 








0.53) 








0.82) 


HC complete 


0.008 


0.214 


n.d. 


n.d. 


0.045 


0.038 


n.d. 


n.d. 


0.020 


0.096 


n.d. 


n.d. 


0.049 


0.032 


n.d. 


n.d. 


HC single 


0.006 


0.159 


n.d. 


n.d. 


0.030 


0.038 


n.d. 


n.d. 


0.008 


0.175 


n.d. 


n.d. 


0.034 


0.040 


n.d. 


n.d. 


HC average 


0.004 


0.315 


n.d. 


n.d. 


0.039 


0.043 


n.d. 


n.d. 


0.016 


0.113 


n.d. 


n.d. 


0.046 


0.024 


n.d. 


n.d. 


HC ward 


0.008 


0.315 


0.36 


(0.00, 


0.060 


0.020 


0.58 


(0.28, 


0.036 


0.036 


0.44 


(0.00, 


0.063 


0.023 


0.62 


(0.34, 










0.52) 








0.72) 








0.61) 








0.73) 


Mantel 


-0.014 


0.712 


0.00 


(0.00, 


0.040 


0.201 


0.91 


(0.00, 


-0.005 


0.516 


0.00 


(0.00, 


0.050 


0.207 


0.79 


(0.00, 










0.80) 








0.97) 








0.76) 








0.91) 


Blomberg K 


0.002 


0.305 


0.50 


(0.00, 


0.025 


0.366 


0.44 


(0.00, 


0.091 


0.038 


0.71 


(0.00, 


0.593 


0.080 


0.42 


(0.00, 










0.67) 








0.72) 








0.82) 








0.54) 


PICv 


1594 


0.307 


n.d. 


n.d. 


127 


0.345 


n.d. 


n.d. 


40.2 


0.052 


n.d. 


n.d. 


5.58 


0.002 


n.d. 


n.d. 


Pagel X 


0.105 


0.016 


0.44 


(0.16, 


0.216 


0.030 


0.44 


(0.00, 


0.153 


0.025 


0.45 


(0.00, 


0.646 


0.016 


0.54 


(0.32, 










0.57) 








0.57) 








0.59) 








0.63) 


AM oriAbouheif 


-0.013 


0.680 


n.d. 


n.d. 


0.016 


0.361 


n.d. 


n.d. 


0.038 


0.141 


n.d. 


n.d. 


0.127 


0.023 


n.d. 


n.d. 


AM Abouheif 


-0.014 


0.671 


0.01 


(0.00, 


0.015 


0.338 


0.24 


(0.00, 


0.037 


0.138 


0.31 


(0.00, 


0.126 


0.019 


0.49 


(0.03, 










0.32) 








0.49) 








0.48) 








0.61) 


AM sumDD 


0.000 


0.166 


n.d. 


n.d. 


0.001 


0.168 


n.d. 


n.d. 


0.001 


0.136 


n.d. 


n.d. 


0.013 
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Individuals were subdivided according to variability of viral load (Strict), risk category (MSM) or both. P-values showing borderline significance (P<0.1) 
are in blue, and formal significance (P<0.05) is in red, and confidence intervals in which the lower limit is above zero are also in red. n.d., not done. 



('ward' and 'Abouheif, respectively) 
(Supplementary Figs S5 and S6). The default 
methods performed equally well or better than any 
of the variations. 

Application to data from the Netherlands 

Supplementary Figure S4 shows the shape of the 
phylogeny inferred from the Netherlands data. 
These data have not been previously analysed for 
heritability, and so to reduce the problem of multiple 
testing only two methods, Pagel's X and PP, were 
used to detect and measure heritability in each sub- 
division of the data. 

Significant heritability was found only in one 
subdivision (MSM from the Netherlands 
with Strict viral loads) and using one statistic 
(Pagel's A), which gave a heritability estimate of 
h 2 = 0.S2 (Table 3). No effect was found when the 
PP method was used. None of the confidence inter- 
vals on h 2 excluded 0. The estimated distributions of 
h 2 are shown in Fig. 5. 



DISCUSSION 

In this study, we developed new methods for detect- 
ing and measuring heritability of SPVL using phylo- 
genetic relationships and compared them with 
established methods on real and simulated data. 



Which cohorts exhibit heritability? 

Heritability was detected consistently in the MSM 
Strict subdivision ofthe Swiss cohort (Table 2), sup- 
portingthe previous study ofthese data which found 
significant and high heritability in that subdivision 
[36]. Note that in that study, the level of significance 
was detected by randomizing the PICv rather than 
the K statistic, hence the strong result for PICv in the 
MSM Strict subdivision. We also found that several 
statistics uncovered significant heritability in all of 
the subdivisions of the Swiss data. In many such 
cases, the confidence intervals of h 2 also excluded 
zero, and the estimates for h 2 were high (0.44-0.68). 
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Figure 1. Heritability estimated by ABC in all subdivisions of the Swiss data. The subdivision is written above the plot. Results 
from the PP method are in red, and from PagePs X in blue, with the overlap in purple 



In spite of significant heritability being found 
in previous work on the Rakai data [34], we did 
not consistently find heritability in this cohort. 
Although they are not strictly comparable, the 
results from the previous studies of the Rakai 
and Swiss data suggest that heritability is higher 
in the latter cohort, and this may underlie the 
differences in our results. However, the best esti- 
mate for h 2 from the Rakai data was high (0.51) 
(Table 1). 

Six different subdivisions of the Netherlands data 
were analysed using PP and PagePs A, and only a 
single positive result was found with one statistic 
(A.) in one subdivision, suggesting that heritability 
is close to the detection borderline. It was estimated 
at f? 2 = 0.52 (Table 3), which falls high within the 
range of previous estimates [30]. 

It is interesting that the Swiss cohort shows more 
apparent heritability than the Netherlands in spite of 
a similar transmission routes and genetic back- 
grounds. One potential explanation would be differ- 
ences in coverage. The prevalence in Switzerland is 
higher (0.4% compared with 0.2%) [63], but so is the 



sample size (661 compared with 416) which sug- 
gests similar levels of coverage, which are reportedly 
high in both cohorts [47, 64]. In the Swiss dataset, 
the mean SPVL of the Strict group is significantly 
higher than that of the rest (P= 0.0004), and the 
same is true of the MSM over non-MSM 
(P = 0.003) [36]. The same is not true of these two 
categories in the Netherlands cohort (data not 
shown), which suggests that viral virulence geno- 
types are less structured in this cohort and may 
explain lower heritability. 

The exclusion of non-Strict individuals generally 
increased the level of significance, but this was 
dependent on the cohort and the method. In the 
Swiss cohort, the exclusion increased significance 
when using the PP and HC methods, but not other- 
wise (Table 2). In the Netherlands cohort, the exclu- 
sion increased significance markedly when using 
Pagel's A, but not as much using PP (Table 3). 
Understanding why the Strict population exhibits 
higher heritability than whole sample may be an im- 
portant step in both estimating and understanding 
the mechanisms behind heritability. Simple models 
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Figure 2. Heritability estimated by ABC in all subdivisions of the Rakai data. The subdivision is written above the plot. Results 
from the PP method are in red, and from Pagel's X in blue, with the overlap in purple 



fitted to longitudinal viral load data within patients 
suggest that fluctuations do not just represent ran- 
dom noise [65]. Fluctuating viral load has been 
associated with untreated sexually transmitted co- 
infections [66] and this or other host-mediated effect 
may mask the effect of the viral genotype and justify 
the exclusion of individuals with highly variable viral 
loads. 

The same gene pol was used to analyse the Swiss 
and Netherlands datasets, whereas the available 
genes from the Rakai cohort were env and gag. It 
has been shown that env and pol produce similar 



trees [67], but there are topological differences, 
and our work also demonstrates different results be- 
tween gag and env (Fig. 2) . The ABC method of esti- 
mation by simulation makes estimation of h 2 robust 
to differences in gene usage and we do not expect 
systematic differences between the cohorts. There 
are also differences in transmission routes between 
the cohorts, and a higher diversity of HLA types in 
African than European populations [68]. Although 
these differences inhibit direct comparability be- 
tween the European and Ugandan datasets, they 
add to the general nature of this work. 



218 | Shirreffet al. 



Evolution, Medicine, and Public Health 




0.0 0.2 0.4 0.6 0.8 1.0 




0.0 0.2 0.4 0.6 0.8 1.0 



CD ■* 
CO o 
Q_ cv 



0.0 0.2 0.4 0.6 



0.0 0.2 0.4 0.6 




0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 




Figure 3. Comparison ofthe sensitivity ofvarious statistics to 
heritability on the entire Swiss phylogeny. (Left side) The rela- 
tionship between heritability and six different statistics under 
comparison. The circles and bars represent the mean and 
standard deviation of the sample. (Right side) The power of 
each statistic to detect heritability at 5% significance. The bars 
represent the standard deviation of the proportion 

Which method is best for detecting heritability? 

The principle test of these statistics is the detection 
of an effect in real data. Pagel's X detected heritabil- 
ity in every subdivision of the Swiss data (Table 2), 
and also produced the strongest result of any statis- 
tic applied to the Rakai dataset (Table 1). The PPand 
HC statistics performed well on the Swiss data, par- 
ticularly on the Strict subdivisions. The AM statistics 
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Figure 4. Comparison ofthe sensitivity ofvarious statistics 
on the Swiss MSM Strict phylogeny. Left and right side as in 
Figure 3 



were less successful at detecting an effect in the 
Swiss data. In the Rakai data, the AM 'patristic' vari- 
ant found a significant result in two subdivisions. 
However, in simulation studies it performed very 
poorly (Supplementary Fig. S6). 

Testing the detection power by simulation relies 
on a simple model of trait evolution, but has the 
advantage that heritability is known. It revealed that 
the PP, HC, Pagel's X and AM statistics were com- 
parably sensitive, detecting an effect at greater than 
approximately 0.4 heritability in the Swiss data, with 
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Table 3. The statistics (Z), P-values from a randomization test, medians and confidence intervals of h 2 
from MHT on the Netherlands data, when analyzed with PP and Pagel's X 
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0.0001 


0.0978 


0.0001 


0.0323 


0.0001 


0.2628 


P-value 


0.824 


0.086 


0.827 


0.200 


0.718 


0.038 


h 2 


0.19 
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0.52 


95% CI 
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(0.00, 0.60) 
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Individuals were subdivided according to variability of viral load (Strict), risk category (MSM) or whether they were from the Netherlands (NL). P-values 
showing borderline significance (P<0.1) are in blue, and formal significance (P<0.05) is in red, and confidence intervals in which the lower limit is 
above zero are also in red. 



AM slightly more sensitive (Figs 3 and 4). The K 
statistic was as sensitive as other statistics when 
applied to the MSM Strict subdivision, but in the 
entire Swiss dataset its sensitivity was low. 

Simulations suggest that none of the methods 
can detect heritability lower than approximately 
/? 2 = 0.4, and this threshold is higher in some 
phylogenies. This threshold is confirmed by the 
finding that estimates of h 2 are always above 0.4 
when heritability was found to be significant. Most 
studies have estimated lower heritability than this 
[30], and previous modelling work has suggested 
that such low heritability is enough to produce a 
substantial rate of evolution [1 5]. This suggests that 
phylogenetic methods are not adequate to exclude 
the possibility of relevant heritability in HIV viru- 
lence in these datasets. 

Interestingly, another study which took an analyt- 
ical and computational approach to comparing be- 
tween Blomberg's K, the AM and the Mantel test 
found that K had a higher power to detect an effect 
than the AM statistic [69]. They also argue that these 
tests should all give the same significance as they are 
based on the cross-product of a phylogenetic simi- 
larity and trait similarity matrix. In contrast, we 
found marked differences between their perform- 
ances, with the AM, K and Mantel statistic having 
decreasing power to detect an effect. They found that 
the sensitivity of these methods was dependent on 
the shape of the phylogeny, so differences in the 
source oftrees (simulated versus inferred) are a pos- 
sible source of discrepancy in the respective studies. 



This is beyond the scope ofthis study but deserves to 
be the subject of future work. 

The PP, along with the HC methods, is able to 
account for cofactors of SPVL: ignoring these may 
lead to overestimates or underestimates of heritabil- 
ity if they associate or dissociate on the tree, respect- 
ively. Indeed, an increase in signal when cofactors 
were taken into account was seen in previous ana- 
lysis of transmission pairs in the Rakai cohort [34]. 
However, in the absence of a method to include the 
effect of cofactors in simulations, this aspect of the 
PP or HC methods cannot be harnessed to measure 
h 2 . One possible method would be to calculate the 
fixed effects of the cofactors using a linear regres- 
sion, simulate only the residuals, and subsequently 
add the fixed effects, but this requires treating re- 
siduals as data, which is inappropriate in the likely 
event that there is correlation between the effects of 
cofactors and the SPVL [70]. The PP method pro- 
duces a dataset of couples, which is analogous to 
other couples studies [31-34]. The lack of sensitivity 
of the PP method and the other phylogenetic 
methods suggests that the phylogeny cannot (yet) 
tell us everything that the epidemiology does about 
the transmission network. 

The X statistic has the advantage that it incorpor- 
ates both topology and branch lengths, and analyses 
the entire sample. It is notable, therefore, thatthe PP 
method is relatively successful in spite of its 
analysing only a subset of individuals who form ap- 
parent transmission pairs (60%), and in particular 
ignores deep relationships within the phylogeny. 
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Figure 5. Heritability estimated by ABC in all subdivisions of the Netherlands data. The subdivision is written above the plot. 
Results from the PP method are in red, and from Pagel's k in blue, with the overlap in purple 



This suggests that most signal lies in the recent 
phylogenetic relationships. However, in unpub- 
lished work, Hodcroft et al. found SPVL heritability 
using pedigree analysis on UK data [71]. In contrast 
to our work, they found that collapsing poorly sup- 
ported nodes in the tree and thereby ignoring some 
of the shallow relationships in the phylogeny had a 
negligible effect on their results for some datasets. 
The AM method is also successful, which (with the 
exception of the 'patristic' variant) ignores branch 
lengths, indicating that topology may be more im- 
portant. Rigorously identifying which clades or levels 
of the phylogeny are responsible for heritability 



would bean interesting direction for future research. 
This may differ for phylogenies which are less star- 
like than HIV. The use of longer sequences and bet- 
ter sampled datasets is likely to result in better 
detection and estimation of heritability, as poorly 
resolved trees scramble the heritability signal. 
However, the detection threshold is unlikely to 
change even with improved sampling, as it was 
based on simulations which were blind to uncer- 
tainty in the tree. 

It is noteworthy that the PP and Pagel's X each 
have their strengths in estimating the distributions 
of h 2 in the different Rakai subdivisions. The PP 
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method identifies a more strongly positive distribu- 
tion in subdivision gp41, which Pagel's X does not, 
but the latter detects a more positive distribution in 
p24 and subtype A gp4h This suggests that an ap- 
proach which combines these methods may be 
appropriate. 

CONCLUSIONS AND IMPLICATIONS 

In this study, we compare several phylogenetic com- 
parative methods to detect heritability, h 2 . Many 
methods detect heritability successfully in real and 
simulated data, but sensitivity drops off below 
h 2 = 0A. We recommend the PP method and 
Pagel's X for use in detecting and estimating herit- 
ability, the former for its consideration of co-factors, 
and the latter for its marginally higher level of 
sensitivity. 

Estimates of heritability were consistent with pre- 
vious studies on the Rakai and Swiss data, and con- 
firm that heritability can be very high, which has 
clinical and evolutionary implications. When applied 
to the Netherlands data, heritability was found only 
in the most homogeneous subdivision, MSM who 
originate in the Netherlands with Strict viral 
loads. Differences in heritability between cohorts, 
subdivisions and methods for estimation carry 
implications for the biology of heritability, 
which offer interesting avenues for future modelling 
work. Experimental and epidemiological research 
are also required to directly identify viral factors 
which contribute to variance in SPVL, as well as 
exploring the impact of treatment on virulence 
evolution. 



SUPPLEMENTARY DATA 

Supplementary data are available at EMPH online. 
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